Time-distributed ECC scrubbing to correct memory errors

ABSTRACT

Error correction circuitry attempts to detect and correct on the fly erroneous words within random access memory (RAM) within a computer system. RAM errors are scrubbed or corrected back in the memory without delaying the memory access cycle. Rather, the address of the section or row of RAM that contains the correctable error is latched for later use by an interrupt-driven firmware memory-error scrub routine. This routine reads and rewrites each word within the indicated memory section--the erroneous word is read, corrected on-the-fly as it is read, and then rewritten back into memory correctly. If the size of the memory section exceeds a predetermined threshold, then the process of reading and re-writing that section is divided into smaller sub-processes that are distributed in time using a delayed interrupt mechanism. Duration of each memory scrubbing subprocess is kept short enough that the response time of the computer system is not impaired with the housekeeping task of scrubbing RAM memory errors. System management interrupts and firmware may be used to implement the memory-error scrub routine, which makes it independent of and transparent to the various operating systems that may be run on the computer system.

FIELD OF THE INVENTION

The present invention relates to error correction within digital computer systems. In particular, it relates to scrubbing, or correcting, errors in memory in a time-distributed manner.

BACKGROUND OF THE INVENTION

When data is read back from a memory in which it has been stored, it occasionally happens that an error occurs, i.e. that the data read back is not identical to the data previously stored.

A number of error correcting codes (ECC) are known in the prior art that are capable of not only detecting but also correcting errors. Typically, these codes can detect a broader range of errors than they can correct. For example, a DED-SEC code is capable of detecting any double errors that occur within the data field the code covers (i.e. errors in which two bits within the field are erroneous) and of correcting any single errors (i.e. only one wrong bit).

As applied to main memory or Random Access Memory (RAM) within a computer system, it may be desirable to consider each 64-bit double word as its own data field, i.e., to store along with it its own ECC or redundancy check information. As the computer system reads words from memory, this ECC information would be checked so that errors in the word could be detected and hopefully corrected.

If the ECC hardware detects a correctable error, then it is desirable to correct the word being read on-the-fly so as to provide the processor or I/O controller that is reading main memory with a corrected word. This is a performance critical task because accessing main memory is one of the most performance-critical aspects of computer system design. Any improvement or degradation in the latency between an access request and the delivery of the data requested often has a substantial effect on overall system performance.

It is further desirable to correct the word in main memory because errors accumulate over time. If subsequent errors occur within the same word, then they may convert a correctable error into an un-correctable error. The process of correcting the data stored in memory is called scrubbing the memory. Compared with the on-the-fly correction described above, the process of correcting the data stored in main memory is more time consuming and more costly in terms of requiring additional hardware and/or software to implement it.

In one approach to scrubbing memory, it is desired to not impose any of the error correction task on software. In this case, it would be desirable to include in the memory controller a state machine that temporarily suspends the normal operation of the memory and writes the corrected word back to the erroneous memory location. Disadvantages of this approach include both the complexity of the hardware that would be required to do the write back and the performance penalty because the memory would not be accessible for other purposes until the correct and re-write process is completed.

In another approach to scrubbing memory, it is desired to keep hardware costs and complexity at a minimum and impose most of the error correction task on software. Such an approach would find it desirable to generate an interrupt to activate software or firmware, executing on the processor, to correct the erroneous memory location. Unfortunately, in some systems the limited number of interrupt request signals or vectors that are available are already utilized. Also, a different version of the correction routine may be required for each different operating system that will be run on the computer system.

SUMMARY OF THE INVENTION

A computer system includes a processor and a memory with error correction capabilities. The memory is partitioned into sections.

When a controller for the memory determines that a memory word contains a correctable error, it indicates to the processor, via an interrupt, the section of memory to which the erroneous word belongs. In response, the processor reads and rewrites each word within that section of the memory. The interrupt mechanism used is distinct from that used for input/output interrupts.

In some embodiments, the memory controller generates the error correction check bits when data is written to the memory. In some embodiments, the memory controller corrects the memory data as it is read from the memory into the processor. In some embodiments, the address space, processor state and register set used by the processor for the reading and re-writing process is distinct from that used during normal processor operation and distinct from that used for input/output interrupts.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated in the following drawings, in which known circuits are shown in block-diagram form for clarity. These drawings are for explanation and for aiding the reader's understanding. The present invention should not be taken as being limited to the preferred embodiments and design alternatives illustrated.

FIG. 1 shows the components of the error correcting scrubber of the present invention and their interconnections.

FIG. 2 shows the time sequence of the scrubbing operation of the present invention and its associated control signals.

FIG. 3 shows how one embodiment interleaves check bits and data bits within a memory word and how it divides RAM into N sections of four subsections each.

FIG. 4 shows the sequence of steps that the system management firmware goes through to scrub memory errors.

FIG. 5 shows the relationship, according to one embodiment of the present invention, among the system management firmware and the software, hardware and basic input/output system (BIOS) firmware of an example computer system.

FIG. 6 shows the error correction matrix used in one embodiment of the present invention. This matrix determines both how the error check bits are generated from the data bits and how a single erroneous bit can be identified from the syndrome bits.

DETAILED DESCRIPTION OF THE INVENTION

Architecture

Disclosed herein are various alternative embodiments and design alternatives of the present invention which, however, should not be taken as being limited to the embodiments and alternatives described. One skilled in the art will recognize alternative embodiments and various changes in form and detail that may be employed while practicing the present invention without departing from its principles, spirit or scope.

The present invention is a method and apparatus for correcting erroneous words within a computer system memory. FIG. 1 is a system architecture diagram of the memory and processor portion of a computer system that uses a RAM scrubber according to one embodiment of the present invention.

As words are read from memory, ECC circuitry attempts to detect and, if possible, proceeds to correct errors on the fly, i.e. before they are provided to the requester.

If a correctable error occurs, the ECC circuitry performs the correction and provides the corrected data word to the requester. In one embodiment, a one-cycle delay in memory access latency accommodates the error correction process.

The ECC circuitry scrubs, or corrects, the errors in RAM without delaying the memory access cycle to do so. The corrected word is not immediately rewritten into RAM. Rather, an indication of the section of RAM in which the error occurred is latched for later use by a firmware memory scrubbing routine.

In one embodiment, the ECC circuitry does not require that the full word address of the erroneous word be latched. Rather, in order to reduce hardware cost and complexity, substantially fewer bits than a full word address are stored--the stored bits indicating only which section of memory contains the erroneous word. In one embodiment, each section corresponds to a memory row and the row address is latched to indicate the section to be scrubbed.
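As a rough illustration of why a section indication needs far fewer bits than a full word address, the following sketch derives a section index from a byte address and compares the two bit counts. The memory size, the section size and the function names are assumptions chosen only for this example; they are not taken from the described hardware.

```python
# Hypothetical parameters: 64 MB of RAM split into 8 MB sections (rows).
SECTION_SIZE = 8 * 1024 * 1024
MEMORY_SIZE = 64 * 1024 * 1024
WORD_BYTES = 8  # one 64-bit data word


def section_of(byte_address):
    """Return the index of the section (row) that contains byte_address."""
    if not 0 <= byte_address < MEMORY_SIZE:
        raise ValueError("address outside of RAM")
    return byte_address // SECTION_SIZE


def bits_needed(count):
    """Number of bits needed to distinguish `count` different values."""
    return max(1, (count - 1).bit_length())


# Bits a section latch would hold versus bits of a full word address.
section_bits = bits_needed(MEMORY_SIZE // SECTION_SIZE)
word_bits = bits_needed(MEMORY_SIZE // WORD_BYTES)
```

With these assumed sizes the latch holds only the few bits that select one of eight sections, at the price that the firmware must scrub the whole section rather than a single word.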

The section address is provided to an interrupt-driven firmware routine that scrubs that section, i.e. that reads and rewrites each memory word within that section. This ensures that the erroneous word is read, corrected on-the-fly as it is read, and then rewritten. When the word is rewritten back into memory, it is stored correctly. This is desirable because errors accumulate over time and a second error within the same memory word is likely to make that word uncorrectable.

If the size of a memory section exceeds a predetermined threshold, then the process of scrubbing that section is divided into smaller sub-processes. These sub-processes are distributed in time using delayed interrupts. By keeping the duration of each subprocess below a threshold, the ECC circuitry ensures that the response time of the computer system is not significantly impaired by the housekeeping task of scrubbing memory errors.

If an un-correctable error occurs, the ECC circuitry generates a software interrupt. Often, such an error is not recoverable and the process executing must be aborted or the system must be re-booted.

In one embodiment, system management interrupts and firmware provide this memory error scrubbing in a manner that is independent of and transparent to the operating system running on the computer system. System management interrupts (SMIs) occupy an interrupt vector space that is independent of that of regular interrupts, such as input/output interrupts. System management interrupt service routines execute in a program address space that is independent of that of regular program execution and of that of regular interrupts. System management interrupt service routines make use of processor state information that is independent of that used for regular program execution and of that used for regular interrupts.

In this embodiment, there are no conflicts or contention for interrupt vectors or program address space between the memory scrubbing routine and any normal program or interrupt activities. Further, there are no operating-system specific drivers required to support memory scrubbing. The advantages of this embodiment include enhancing the reliability and the platform or system independence of the ECC scrub operation.

Each word in RAM memory 102 as shown in FIG. 1 comprises both data bits and error correction code (ECC) or error check bits. In one embodiment, each word comprises 64 data bits and 8 check bits. Typically, the memory being checked for errors is a random access memory (RAM), such as the computer system main memory, or input/output buffer memory. Nevertheless, the ECC circuitry and methods described herein are adaptable to any digital memory that can be written on a word-by-word basis.

All reads from and writes to memory 102 pass through memory and ECC controller 101. Whenever a word is written to RAM memory 102, memory and ECC controller 101 generates error check bits from the data bits provided by the device requesting the write, such as processor 103. In one embodiment, partial word writes are supported by means of a read modify write cycle, as is known in the prior art.

Typically memory and ECC controller 101 also provides read and write access to RAM memory 102 to other devices (not shown), such as peripheral device controllers. Typically this is done via a system bus (not shown).

When a word within RAM memory 102 is read, memory and ECC controller 101 computes a syndrome based on the values of the data and check bits read. If the syndrome is 0, no error occurred. This is the most prevalent situation. Occasionally, an error occurs and the word from RAM memory 102 has one or more bits reversed. In the case of a correctable error, memory and ECC controller 101 corrects the erroneous word on the fly, that is, it provides to the requester a corrected version of the word requested.

The present invention makes no attempt to correct the contents of RAM memory 102 as it is being read. Rather, when memory and ECC controller 101 detects a correctable error, it activates correctable error signal 122, which signals system management interrupt controller and scheduler 105 to initiate a memory scrub operation. This signaling may be done via a system bus, to which both memory and ECC controller 101 and system management interrupt controller and scheduler 105 are coupled.

At the appropriate time (there may be higher priority interrupts pending), interrupt controller and scheduler 105 generates a system management interrupt by activating system management interrupt request signal 120. Other embodiments of the invention could use the computer system's non-maskable interrupt mechanism or its regular interrupt mechanism.

In response, though not necessarily immediately, processor 103 acknowledges the system management interrupt (SMI) request and transfers control to a memory scrubbing interrupt service routine that is resident in system management memory 104. System management memory 104 is typically a non-volatile memory, such as a programmable read only memory (PROM) or flash memory. This memory may also contain the computer system's basic input output system (BIOS).

The memory scrubbing routine reads the contents of section address register 130, which is part of memory and ECC controller 101. Section address register 130 indicates which section of memory needs to be scrubbed. The routine may or may not complete the scrubbing operation at one time. If it does not, it activates schedule system management interrupt signal 121. This causes interrupt controller and scheduler 105 to schedule another system management interrupt after a programmable delay.

In one embodiment, memory and ECC controller 101 is implemented in a first integrated circuit that also couples processor 103 and RAM memory 102 to a high-speed system bus (not shown) that complies with the well known peripheral component interconnect (PCI) specification. In this embodiment, system management interrupt controller and scheduler 105 is implemented in a second integrated circuit that also couples this PCI bus with an industry standard architecture (ISA) bus. In this embodiment, schedule system management interrupt signal 121 is implemented by writing specified values to specified control registers within the second integrated circuit.

Operation

FIG. 2 is an example timing diagram showing how scrubbing operation 203 is distributed in time. Scrubbing operation 203 is active during each of time periods 206, which are separated by substantial time intervals. Each time period 206 is initiated by a corresponding activation 205 of system management interrupt request signal 120. In the particular example sequence shown in FIG. 2, a single activation of correctable error signal 122 (i.e., a single occurrence of a correctable memory error) results in three activations 205 of system management interrupt request signal 120.

The first activation 205 is generated by system management interrupt controller and scheduler 105 in response to activation 204 of correctable error signal 122. Correctable error signal 122 is generated by memory and ECC controller 101.

Each subsequent activation 205 is generated by system management interrupt controller and scheduler 105, in response to but after a programmable delay from each activation 207 of schedule system management interrupt signal 121. Schedule system management interrupt signal 121 is generated by processor 103 acting under control of system management firmware 104.

When system management firmware 104 completes scrubbing the section of memory that contains the correctable error, it does not schedule another system management interrupt. Scrubbing operation 203 is not active again until memory and ECC controller 101 detects another correctable error and activates correctable error signal 122.

FIG. 3 is a memory map showing the layout of RAM memory 102 according to one embodiment. In this embodiment, RAM memory 102 is divided into N sections, numbered 1 to N. To shorten the duration of each of time periods 206, each section of RAM memory 102 is divided into four subsections, denoted "a" through "d".

For example, if a correctable error occurs within Section 2, then subsection 2a is scrubbed during one time period 206 and another system management interrupt is scheduled, then subsection 2b is scrubbed during another time period 206 and another system management interrupt is scheduled, then subsection 2c is scrubbed during another time period 206 and another system management interrupt is scheduled, then subsection 2d is scrubbed during a final time period 206.
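The division of a section into subsections "a" through "d" can be sketched as a simple address computation. This is only an illustrative model: the function name, the 16 MB section size used in the example and the assumption that sections are equally sized and laid out back to back starting at address 0 are not taken from FIG. 3.

```python
MB = 1024 * 1024


def subsection_ranges(section_number, section_size, parts=4):
    """Split one section into equal (label, start, end) byte ranges.

    Sections are numbered from 1, as in the text; `end` is exclusive.
    The contiguous layout starting at address 0 is an assumption.
    """
    if section_number < 1:
        raise ValueError("sections are numbered from 1")
    if section_size % parts:
        raise ValueError("section size must divide evenly into subsections")
    base = (section_number - 1) * section_size
    step = section_size // parts
    labels = "abcdefghijklmnopqrstuvwxyz"[:parts]
    return [
        (f"{section_number}{labels[i]}", base + i * step, base + (i + 1) * step)
        for i in range(parts)
    ]


# Section 2 of a memory built from hypothetical 16 MB sections.
ranges = subsection_ranges(2, 16 * MB)
```

Each tuple corresponds to the work done during one time period 206; the firmware would scrub one range, schedule the next interrupt, and return.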

The memory map of FIG. 3 also shows, according to one embodiment of the invention, how the ECC or check bits are interleaved among the data bits. Using this particular interleaving and the particular ECC code shown in FIG. 6, the present invention detects any error that is confined within a single 4-bit nibble (i.e., bits 0 to 3, 4 to 7, etc.).

RAM memory 102 may be implemented using a series of integrated circuits (ICs) each of which holds one nibble's worth of data for a number of words. If one such IC, which may be a single in-line memory module (SIMM), is missing or defective, then all of the bits of that nibble can be erroneous. Because this is a common failure mode, it is desirable to be able to detect that.

The bit order within the code word according to this embodiment is as follows:

1. Data bits 0 to 25

2. Check bit 2

3. Check bit 5

4. Data bits 26 to 31

5. Check bit 3

6. Check bit 4

7. Data bits 32 to 57

8. Check bit 6

9. Check bit 1

10. Data bits 58 to 63

11. Check bit 7

12. Check bit 0
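The bit order listed above can be expanded into an explicit table of 72 positions, which makes it easy to check that every data bit and check bit appears exactly once and to see which nibble holds each check bit. The sketch below assumes that the first item in the list occupies position 0 of the code word and that nibbles are counted as consecutive groups of four positions; the names are illustrative.

```python
# ("D", first, last) is a run of data bits; ("C", n) is check bit n.
LAYOUT = [
    ("D", 0, 25), ("C", 2), ("C", 5),
    ("D", 26, 31), ("C", 3), ("C", 4),
    ("D", 32, 57), ("C", 6), ("C", 1),
    ("D", 58, 63), ("C", 7), ("C", 0),
]


def code_word_positions():
    """Expand LAYOUT into a list with one (kind, index) entry per position."""
    positions = []
    for entry in LAYOUT:
        if entry[0] == "D":
            positions.extend(("D", i) for i in range(entry[1], entry[2] + 1))
        else:
            positions.append(entry)
    return positions


POSITIONS = code_word_positions()

# Nibble number (position // 4) that holds each check bit.
CHECK_NIBBLES = {
    index: pos // 4
    for pos, (kind, index) in enumerate(POSITIONS)
    if kind == "C"
}
```

Under these assumptions the eight check bits fall in pairs into four different nibbles, sharing each of those nibbles with two data bits.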

In another embodiment, the error code of FIG. 6 is used, but the check bits, if used, are stored in bits 64 to 71 of the memory word. When the computer system is initially booted at power on self test (POST) time, then the system BIOS can determine or look up whether or not the system is ECC capable, i.e. whether bits 64 to 71 are actually present in RAM memory 102. The BIOS enables or disables memory ECC checking accordingly. This embodiment allows the same system design and components to be used both for a lower-cost computer system that does not have memory error detection and correction capabilities and a higher-reliability computer system that does.

FIG. 4 is a flow chart showing the steps within the memory scrub interrupt service routine. This interrupt handler starts 401 when processor 103 acknowledges an occurrence of a system management interrupt. Next, processor 103 in step 402 determines whether the active interrupt is a memory scrub interrupt, in which case control passes to step 404. Otherwise whatever other system management event occurred is serviced in step 403--a system power management event, for example.

Step 404 determines whether this is the first pass, or the first occurrence of a system management interrupt request 205 due to a particular correctable error event 204. If not, control passes to step 412. If so, control passes to step 405, which reads, from section address register 130 within memory and ECC controller 101, the address of the section within RAM memory 102 that contains the word with a correctable error.

Next, step 406 reads or determines the size of this section. Typically, each section is the same size, but as more memory is added to the computer system each section contains more words. Step 408 tests if the size of this section is less than or equal to a predetermined limit, 8 megabytes (MB) in the particular case shown in FIG. 4. If so, then the entire section is scrubbed in step 407, and the system management service routine terminates in step 415.

If the size of the section to be scrubbed is greater than the limit, then, in step 409, the first subsection of the memory section containing the error is scrubbed. Next in step 410, another memory scrub interrupt is scheduled to occur after a predetermined delay, and the system management service routine terminates in step 415. Schedule system management interrupt signal 121 is used for this scheduling.

In step 412, the next memory subsection is scrubbed. Next, step 413 determines whether or not there is another memory subsection to be scrubbed. If not, then the system management service routine terminates in step 415. If so, then in step 414, another memory scrub interrupt is scheduled to occur after a predetermined delay, and the system management service routine terminates in step 415.
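The control flow of steps 404 to 415 can be modeled in a few lines. The following is a toy simulation, not firmware: the class and function names are hypothetical, the actual scrubbing is replaced by a log entry, and the programmable delay between interrupts is not modeled. The 8 MB limit and the four subsections follow the examples of FIG. 4 and FIG. 3.

```python
LIMIT = 8 * 1024 * 1024   # size limit from the FIG. 4 example (8 MB)
SUBSECTIONS = 4           # subsections per section, as in FIG. 3


class ScrubHandler:
    """Toy model of the memory scrub interrupt service routine."""

    def __init__(self, section_size):
        self.section_size = section_size
        self.pending = []   # (section, subsection) pairs still to scrub
        self.log = []       # what has been scrubbed, in order

    def on_interrupt(self, latched_section=None):
        """Handle one scrub interrupt; return True if another is scheduled."""
        if not self.pending:
            # First pass (steps 405 and 406): read section and its size.
            if latched_section is None:
                raise ValueError("first pass needs the latched section")
            if self.section_size <= LIMIT:            # step 408
                self.log.append((latched_section, "all"))  # step 407
                return False
            self.pending = [(latched_section, k) for k in range(SUBSECTIONS)]
        self.log.append(self.pending.pop(0))          # step 409 or 412
        return bool(self.pending)                     # steps 410, 413, 414


def run_scrub(handler, latched_section):
    """Deliver interrupts until none is rescheduled; return their count."""
    interrupts = 1
    again = handler.on_interrupt(latched_section)
    while again:
        interrupts += 1
        again = handler.on_interrupt()
    return interrupts
```

In this model a section at or below the limit is handled by a single interrupt, while a larger section takes one interrupt per subsection, so no single invocation scrubs more than a quarter of the section.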

Independent System Management Firmware and Interrupt Requests

FIG. 5 shows how system management firmware 104 fits in with the hardware, software and other firmware components of an example computer system in a way that is independent of, and transparent to, operating system 511.

System management interrupt controller and scheduler 105 schedules interrupts that activate system management firmware 104. It also receives requests from system management firmware 104 to schedule such interrupts to occur after a specified delay.

System management firmware 104 is independent of BIOS firmware 521, though both may reside in the same non-volatile memory device within the computer system. System management interrupt request signal 120 is independent of the interrupt request control signals that communicate between peripheral devices 531 and BIOS firmware 521 or software device drivers 512.

BIOS firmware 521 in a typical system supports basic input and output operations, such as display and keyboard control functions. Peripheral devices 531 and operating system 511 communicate by means of device interrupts handled by the BIOS and by means of the OS making calls to BIOS routines.

Other input and output operations are supported by device drivers 512. In these cases, applications software 510 and peripheral devices 531 communicate with each other by means of interrupts handled by the device drivers and device driver calls respectively. Software device drivers 512 are used instead of drivers within BIOS firmware 521 in the case of more complex peripheral devices such as network interface cards or of more advanced operating systems such as Windows NT™ or Windows 95™.

System management firmware 104 performs the memory scrub operation of the present invention without interfering in any way with peripheral devices 531, BIOS firmware 521, device drivers 512 (if used), operating system 511 or applications software 510.

An Example ECC Code and Algorithm

The present invention can be used with a variety of ECC codes, one of which is illustrated in FIG. 6. This particular ECC code started with Rao and Fujiwara's description¹ of a method for constructing a SEC-DED-S4ED rotational code that protects 64 data bits with 8 check bits. This code was augmented with the unused weight-3 column vectors to produce a code with length 72 that retains the SEC-DED-S4ED and rotational properties, and is symmetric.

The first 64 columns of FIG. 6, i.e. those labeled data bits, show the G-matrix of the ECC code used in this embodiment. Each row of the G-matrix shows how to compute, on writing RAM memory 102, the corresponding check bit. The first 72 columns of FIG. 6, i.e. those labeled data bits and check bits, show the H-matrix of the ECC code used in this embodiment. Each row of the H-matrix shows how to compute, on reading RAM memory 102, the corresponding syndrome bit.

When writing a word into RAM memory 102, memory and ECC controller 101 computes the 8 check bits as follows:

For the check bit N, select the N'th row in the G-matrix, where the rows are numbered 0 to 7. The 64 columns of the G-matrix correspond to the 64 bits of the word specified by the device that is requesting the memory write operation.

Compute the 1-bit sum (i.e. the modulo-2 sum) of the data bits that are marked with a 1 in the selected row. That sum is the value of check bit N.

Write the 8 check bits computed above into the memory along with the 64 data bits of the word being written.
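The row-by-row modulo-2 computation described above can be tried out on a much smaller code. The sketch below uses a toy G-matrix with 4 data bits and 4 check bits whose data columns all have weight 3; it is a stand-in invented for illustration and is not the 8-row, 64-column matrix of FIG. 6. The procedure, one modulo-2 sum per row, is the same.

```python
# Toy G-matrix (an assumption, not FIG. 6): row N marks the data bits
# that contribute to check bit N.
G = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]


def check_bits(data):
    """Return one check bit per G-matrix row for a list of 0/1 data bits."""
    if len(data) != len(G[0]):
        raise ValueError("wrong number of data bits")
    # Modulo-2 sum of the data bits marked with a 1 in each row.
    return [sum(g & d for g, d in zip(row, data)) % 2 for row in G]


def encode(data):
    """Return the stored word: the data bits followed by the check bits."""
    return list(data) + check_bits(data)
```

For instance, `encode([1, 0, 1, 1])` appends the four check bits computed from rows 0 to 3 to the four data bits, giving an 8-bit stored word.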

When reading a word from RAM memory 102, memory and ECC controller 101 computes the 8 syndrome bits as follows:

For the syndrome bit N, select the N'th row in the H-matrix, where the rows are numbered 0 to 7. The 72 columns of the H-matrix correspond to the 64 data bits and the 8 check bits of the word addressed by the device that is requesting the memory read operation.

Compute the 1-bit sum (i.e. the modulo-2 sum) of the data and check bits that are marked with a 1 in the selected row. That sum is the value of syndrome bit N.

Then, memory and ECC controller 101 uses the syndrome to determine if an error has occurred, and if so what type of error, as follows:

If all syndrome bits are zero, then the memory word is correct as read.

Else, if either nibble of the syndrome (i.e. bits s0 to s3, or bits s4 to s7) is non-zero and the other nibble contains three one bits, then some nibble within the word read contains a three bit or a four bit error.

Else, if the syndrome contains an even number of one bits, then an un-correctable error has occurred (e.g. a double-bit error).

Else, if the syndrome contains an odd number of one bits, then a single-bit correctable error has occurred.
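The four rules above form an ordered decision chain, which the following sketch applies to an 8-bit syndrome given in s0 to s7 order. It is a literal reading of the rules as worded; whether the nibble test behaves exactly this way for every syndrome depends on the matrix of FIG. 6, which is not reproduced here. The function name and the returned labels are illustrative.

```python
def classify_syndrome(s):
    """Classify a syndrome given as a list [s0, ..., s7] of 0/1 values."""
    if len(s) != 8 or any(bit not in (0, 1) for bit in s):
        raise ValueError("expected eight syndrome bits")
    low, high = s[:4], s[4:]           # nibbles s0..s3 and s4..s7
    if not any(s):
        return "no error"
    # One nibble non-zero while the other contains three one bits.
    if (any(low) and sum(high) == 3) or (any(high) and sum(low) == 3):
        return "nibble error"          # three-bit or four-bit error
    if sum(s) % 2 == 0:
        return "uncorrectable error"   # e.g. a double-bit error
    return "single-bit correctable error"
```

The order of the tests matters: the nibble rule is checked before the parity rules, so a syndrome that satisfies it is never reported as a single-bit error.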

In the case of a single bit error, memory and ECC controller 101 uses the syndrome to invert exactly one bit within the word as read, as follows:

Compare the 8 syndrome bits to the 8 rows of the H-matrix of FIG. 6 column by column. The column that they match is the column corresponding to the bit position that was read erroneously. For example, if the syndrome bits are 0001 0101 (in s0 to s7 order), then bit 7 of the word was read erroneously.

Invert whatever value was read for the bit position that corresponds with the matching column. In the same example, invert bit 7 of the word as read to generate the correct word.
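Syndrome computation and column matching can be demonstrated end to end on a small code. The H-matrix below is a toy with 4 rows and 8 columns (4 data bits followed by 4 check bits), invented for this illustration; it is not the 8 by 72 matrix of FIG. 6, although the steps are the same: compute one modulo-2 sum per row, find the column equal to the syndrome, and invert that bit.

```python
# Toy H-matrix (an assumption, not FIG. 6): columns 0-3 are data bits,
# columns 4-7 are check bits.
H = [
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 1],
]


def syndrome(word):
    """Modulo-2 sum, per H-matrix row, of the marked bits of `word`."""
    if len(word) != len(H[0]):
        raise ValueError("wrong word length")
    return [sum(h & b for h, b in zip(row, word)) % 2 for row in H]


def correct_single_bit(word):
    """Return a copy of `word` with a single-bit error, if any, inverted."""
    s = syndrome(word)
    if not any(s):
        return list(word)              # word is correct as read
    columns = [[row[j] for row in H] for j in range(len(word))]
    if s not in columns:
        raise ValueError("syndrome matches no column: not a single-bit error")
    fixed = list(word)
    fixed[columns.index(s)] ^= 1       # invert the bit at the matching column
    return fixed
```

Because every column of this toy matrix has odd weight, the syndrome of a double-bit error has even weight and matches no column, so the function raises instead of silently miscorrecting.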

In one embodiment, the data transferred over the system bus is protected from errors by using the same ECC code as is used for RAM memory 102. In this embodiment, memory and ECC controller 101 performs the above described syndrome generation and checking (and perhaps error correction) on data words received from the system bus before they are written into memory with the same error check bits (or perhaps with corrected error check bits, based on the above techniques).

CONCLUSION

As illustrated herein, the present invention provides a novel and advantageous method and apparatus for correcting errors in RAM memory. One skilled in the art will realize that alternative embodiments, design alternatives and various changes in form and detail may be employed while practicing the invention without departing from its principles, spirit or scope.

In particular the system architecture shown in FIG. 1, the control signals shown in FIG. 2, the memory map shown in FIG. 3, the steps in the memory scrub interrupt service routine shown in FIG. 4, the software/firmware/hardware relationships shown in FIG. 5 and the ECC code shown in FIG. 6 may be simplified, augmented or changed in various embodiments of the invention.

The following claims indicate the scope of the present invention. Any variation which comes within the meaning of, or range of equivalency of, any of these claims is within the scope of the present invention.

What is claimed is:
 1. A computer system with memory error correction, comprising a memory to store data, the memory comprising a plurality of words, each word comprising data bits and error check bits, the memory being partitioned into a plurality of sections wherein each word belongs to one section and each section contains a plurality of words; a processor operable to read and rewrite each word within an indicated section, by reading and re-writing words within a first subsection of the indicated section, and then scheduling an interrupt to read and rewrite words within a second subsection of the indicated section; and a memory controller comprising circuitry to determine, in response to a word being read from the memory, if the word being read contains a correctable error, and if so to interrupt the processor, and to provide the processor with an indication of the section to which the word being read belongs, the interrupt occurring via an interrupt request signal distinct from that used for input/output interrupts.
 2. The computer system according to claim 1, wherein said memory controller is further operable in response to a request to write into the memory to generate the error check bits based on the data bits being written.
 3. The computer system according to claim 1, wherein said memory controller is further operable in response to a request to read from the memory to correct an error within any word based on the word's data bits and error check bits as read from the memory and to provide the corrected word in response to the request.
 4. The computer system according to claim 1, wherein the indicated section is below a predetermined byte size and the reading and re-writing of words is performed in only the first section.
 5. The computer system according to claim 1, wherein the processor services the interrupt using an address space, processor state and register set distinct from that used during normal processor operation and distinct from that used for input/output interrupts.
 6. A method of memory error correction, comprising determining if a word read from a memory contains a correctable error, and if so: i) latching an indication of a section to which the erroneous word belongs, each word belonging to one of a plurality of sections and each section containing a plurality of words; ii) interrupting a processor via an interrupt request signal distinct from that used for input/output interrupts; iii) providing the processor with the section indication; and iv) reading and re-writing the words within a first subsection of the indicated section; scheduling an interrupt; and reading and re-writing the words within a second subsection of the indicated section in response to receiving the interrupt.
 7. The method according to claim 6, further comprising: generating error check bits based on the data bits of a word being written into the memory; and storing both the error check bits and the data bits as the word being written.
 8. The method according to claim 7, wherein the determining of a correctable error is based on the data bits and the error check bits read from said memory.
 9. The method according to claim 6, wherein the indicated section is below a predetermined byte size and the reading and re-writing of words is performed in only the first section.
 10. The method according to claim 6, wherein the processor services the interrupt using an address space, processor state and register set distinct from that used during normal processor operation and distinct from that used for input/output interrupts.
 11. A computer system with memory error correction, comprising a) a memory means for storing data, the memory means comprising a plurality of words, each word comprising data bits and error check bits, the memory means being partitioned into a plurality of sections with each word belonging to one of the sections and each section containing a plurality of the words; b) a processor means for reading and writing words within the memory means and for re-writing each word within an indicated one of said memory sections, wherein the processor means reads and re-writes the words within a subsection of the indicated section by signaling the interrupt request means to re-interrupt the processor means, and by reading and re-writing, in response to the re-interrupt, the words within a second subsection of the indicated section; c) an interrupt request means for interrupting the processor, the interrupt request means being distinct from that used for input/output interrupts; and d) a memory controller means for generating the error check bits based on the data bits of any word being written into the memory, for determining based on the data bits and the error check bits if the word accessed by said read contains a correctable error in response to said processor means reading said memory means, and if so for latching an indication of the section to which said erroneous word belongs, for interrupting said processor means via said interrupt request means, and for providing said processor means with said section indication.
 12. The method according to claim 10, wherein said processor means for reading and re-writing each word within the indicated section uses an address space, processor state and register set distinct from that used during normal processor operation and distinct from that used for input/output interrupts.